Mapping human lymph node cell types to 10X Visium - estimating reference expression signatures


Cell2location maps cell types by integrating single-cell/single-nucleus and spatial transcriptomics data. It does so by estimating which combination of cell types, at which abundances, could have produced the mRNA counts observed in the spatial data, while accounting for technical effects (platform/technology effect, contaminating RNA, unexplained variance).
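As a rough illustration of that idea, the sketch below computes the expected mRNA count of each gene at one location as an abundance-weighted sum of cell type signatures. It is a deliberately simplified stand-in, not cell2location's model: it ignores the technical effects listed above, and the names `expected_counts`, `signatures` and `location_abundance`, as well as all numbers, are made up for the example.

```python
def expected_counts(abundance, signatures):
    """Expected count per gene at one location (technical effects ignored).

    abundance:  {cell_type: number of cells at the location}
    signatures: {cell_type: [average mRNA count per gene]}
    """
    n_genes = len(next(iter(signatures.values())))
    totals = [0.0] * n_genes
    for cell_type, n_cells in abundance.items():
        for gene_idx, avg_count in enumerate(signatures[cell_type]):
            # Each cell of this type contributes its average count for the gene.
            totals[gene_idx] += n_cells * avg_count
    return totals


# Toy signatures for three genes (hypothetical values).
signatures = {
    "B_cell": [4.0, 0.5, 0.0],
    "T_cell": [0.5, 3.0, 1.0],
}
# A location containing 2 B cells and 3 T cells.
location_abundance = {"B_cell": 2, "T_cell": 3}
predicted = expected_counts(location_abundance, signatures)
```

Cell2location works in the opposite direction: given the observed counts and the signatures, it infers the abundances.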

Given a cell type annotation for each cell, the corresponding reference cell type signatures $g_{f,g}$, which represent the average mRNA count of each gene $g$ in each cell type $f \in \{1, \dots, F\}$, can be estimated from sc/snRNA-seq data using the 2 provided methods (see below). Cell2location needs untransformed, unnormalised spatial mRNA counts as input. You also need to provide cell2location with the expected average cell abundance per location, which is used as a prior to guide estimation of absolute cell abundance. This value depends on the tissue; it can be estimated by counting nuclei in a few locations of the paired histology image, and an approximate value is sufficient (see the paper's methods for more guidance).
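Two of the inputs described above can be sanity-checked before running the model. The sketch below uses hypothetical helper names (`looks_like_raw_counts`, `average_cells_per_location`) that are not part of cell2location. The first tests whether a small count matrix contains only non-negative whole numbers, as untransformed data should, and the second averages manual nuclei counts from a few locations to obtain an approximate prior. All values are invented.

```python
def looks_like_raw_counts(rows):
    """Return True if every value is a non-negative whole number."""
    return all(
        value >= 0 and float(value).is_integer()
        for row in rows
        for value in row
    )


def average_cells_per_location(nuclei_counts):
    """Mean number of nuclei over the manually counted locations."""
    return sum(nuclei_counts) / len(nuclei_counts)


# Toy matrices (locations x genes): raw counts versus log-transformed values.
raw = [[0, 3, 12], [5, 0, 1]]
log_normalised = [[0.0, 1.39, 2.56], [1.79, 0.0, 0.69]]

raw_ok = looks_like_raw_counts(raw)
transformed_ok = looks_like_raw_counts(log_normalised)

# Nuclei counted by eye in five locations of the histology image (made-up numbers).
prior_cells_per_location = average_cells_per_location([28, 35, 31, 26, 30])
```

A failed check usually means that a normalised or log-transformed layer was passed in place of the raw counts.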

We provide 2 methods for estimating reference expression signatures of cell types from scRNA-seq data:

1) a statistical method based on Negative Binomial (NB) regression. We generally recommend NB regression because it robustly combines data across technologies and batches, which improves spatial mapping accuracy. This notebook uses a dataset composed of multiple batches and technologies to estimate the signatures with this method.

2) hard-coded computation of per-cluster average mRNA counts for individual genes (scvi.external.cell2location.compute_cluster_averages). When the batch effects are small, this faster hard-coded method of computing per cluster averages provides similarly high accuracy. We also recommend the hard-coded method for non-UMI technologies such as Smart-Seq 2.
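For intuition, the second method amounts to averaging raw counts over the cells that share an annotation. The sketch below does this in plain Python on a toy matrix. It is not the implementation behind `compute_cluster_averages`; the function name `cluster_average_counts` and the data are hypothetical.

```python
from collections import defaultdict


def cluster_average_counts(counts, labels):
    """Average count of each gene within each annotated cluster.

    counts: list of per-cell count vectors (cells x genes)
    labels: cell type annotation for each cell, in the same order as `counts`
    """
    sums = {}
    n_cells = defaultdict(int)
    for cell_counts, label in zip(counts, labels):
        if label not in sums:
            sums[label] = [0.0] * len(cell_counts)
        for gene_idx, value in enumerate(cell_counts):
            sums[label][gene_idx] += value
        n_cells[label] += 1
    # Divide the per-cluster sums by the number of cells in that cluster.
    return {
        label: [total / n_cells[label] for total in gene_sums]
        for label, gene_sums in sums.items()
    }


counts = [
    [4, 0, 1],  # cell 1
    [6, 2, 1],  # cell 2
    [0, 5, 3],  # cell 3
]
labels = ["B_cell", "B_cell", "T_cell"]
averages = cluster_average_counts(counts, labels)
```

Because this is a plain average, any batch-specific shift in the counts carries straight into the signature, which is why it suits datasets with small batch effects.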


Loading packages

Loading Visium data

First let's read spatial Visium data from 10X Space Ranger output.

Estimation of reference cell type expression signatures

The signatures are estimated from scRNA-seq data with a Negative Binomial regression model that accounts for batch effects.
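To make the modelling choice concrete, the sketch below evaluates a Negative Binomial log-probability for a single count, parameterised by its mean and an inverse-dispersion term so that the variance is `mean + mean**2 / inv_dispersion`. This parameterisation, the function name `nb_log_pmf`, and the idea of scaling a signature by a single per-batch factor are assumptions made for illustration. The regression model used in the notebook has more terms and is fitted jointly across genes, cell types and batches.

```python
import math


def nb_log_pmf(count, mean, inv_dispersion):
    """Log-probability of `count` under a Negative Binomial distribution.

    Parameterised so that E[count] = mean and
    Var[count] = mean + mean**2 / inv_dispersion. Requires mean > 0.
    """
    r = inv_dispersion
    return (
        math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
        + r * math.log(r / (r + mean))
        + count * math.log(mean / (r + mean))
    )


# Hypothetical setting: one gene in one cell type, observed in two batches
# whose detection efficiency differs by a multiplicative factor.
signature = 3.0
batch_scale = {"batch_A": 1.0, "batch_B": 0.5}
observed_count = 2

log_lik = {
    batch: nb_log_pmf(observed_count, signature * scale, inv_dispersion=2.0)
    for batch, scale in batch_scale.items()
}
```

Fitting the regression means choosing signatures and batch terms that maximise such log-probabilities summed over all cells and genes, rather than evaluating them at fixed values as done here.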

Examine QC plots

  1. Reconstruction accuracy to assess if there are any issues with inference.

  2. The estimated expression signatures differ from the mean expression in each cluster because of batch effects. For scRNA-seq datasets that do not suffer from batch effects (this dataset does), cluster average expression can be used instead of estimating signatures with a model. When this plot deviates strongly from the diagonal (e.g. very low values on the Y-axis, density everywhere), it indicates problems with signature estimation.

Train scvi-cell2location

    mod = scvi.external.cell2location.Cell2location.load(f"{scvi_run_name}_c2l", adata_vis)
    adata_file = f"{scvi_run_name}_c2l/sp.h5ad"
    adata_vis = sc.read_h5ad(adata_file)

Plot cell abundance in spatial coordinates

Perform clustering of cell abundance estimates to identify tissue regions

We find regions by clustering locations/spots (Leiden) based on estimated cell abundance of each cell type. Results are saved in adata_vis.obs['region_cluster'].

Advanced use examples

Modules and their versions used for this analysis

    from session_info import session_info
    session_info()